陳彥甫 - HackMD

實作 Thread package
contributed by < qwe661234 > GitHub: MuThread 以第 10 週測驗題給定的程式碼為基礎，實作類似 POSIX Thread 的套件 Why futex? 論文 "Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux" by Franke, Russell, Kirkwood. Published in 2002 for the Ottawa Linux Symposium. 提到 In traditional UNIX systems, System V IPC (inter process communication) such as semaphores, msgqueues, sockets and the file locking mechanism (flock()) are the basic mechanisms for two processes to synchronize. These mechanisms expose an opaque handle to a kernel object that naturally provides the shared state and atomic operations in the kernel. Services must be requested through system calls (e.g., semop()). The drawback of this approach is that every lock access requires a system call. When locks have low contention rates, the system call can constitute a significant overhead.
陳彥甫 changed 2 years agoView mode Like 3 Bookmark
Cache Coherency & I/O ordering
Cache Coherency 在多處理器的機器上，每個 CPU 都有自己的 cache，以彌補 CPU 和主記憶體(Main memory) 之間較慢的存取速度，當一個特定資料首次被特定的 CPU 讀取時，此資料顯然不存在於該 CPU 的 cache 中 (即 cache miss)，意味著 CPU 需要自主記憶體中讀取資料 (為此，CPU 通常要等待數百個 cycle)，此資料最終將會存放於 CPU 的 cache 中，這樣後續就能直接自 cache 上快速存取，不需要再透過主記憶體。因為每個 CPU 上的 cache 資料是 private 的，因此在進行寫入操作時，會造成其他 CPU 上的 cache 資料不一致的問題。 e.g. 取自 MIT 6.004 L25: Cache Coherence Core0 load memory from address 0xA，資料內容為 2 Core2 store memory to address 0xA，更新資料在自己的 cache 上，並在稍後補回 memory，而更新內容為 3。 Core2 的更新 Core0 並不知道，因此 Core0 還以為 0xA 上所存的資料是 2，但顯然這份資料已經過時。
陳彥甫 changed 2 years agoView mode Like Bookmark
CO Final
(15pt) 1. Assume there is no forwarding in a 5-stage pipline processor 1 (1) (5pt) Please indicate all the potential hazards in the following instruction sequence ? xori x1, x1, 2 addi x6, x5, -4 lw x10, 8(x6) add x5, x6, x6 sub, x2, x2, x1 Ans: addi && lw (2.5pt)
陳彥甫 changed 2 years agoView mode Like Bookmark
New EBB Strategy
考量到 JIT 中 codegen 以及 compilation 的 overhead 過高，我們需要讓一個 EBB 含有更多的指令，理想是希望將一整個 loop body 涵蓋在一個 EBB 中。我們參考 rvemu， rvemu 先將 block 以 unconditional jump 作切分，接著將其 block 中所有 branch taken 會走到的 block 都涵蓋進來，一個 block 含有大量的指令且有好的效能表現。我們的新策略也是希望一個 block 中含有大量的指令，所以我們 trace 整個 control flow graph 並把 trace 過的 block 都涵蓋進來。 control flow graph 的終點有兩種情形，一種是 unconditional jump，而另一個則是遇到 back edge，也就是指向已經 trace 過的節點，下圖展示一個簡單的 EBB 案例。過去的策略我們是直接指向某個 block 的 ir，一旦 block 被 cache 置換掉，EBB 就會變小。過去的另一個策略是用 memcpy，然而頻繁的 memcpy 成本過高，導致效能沒有很好的提昇。於是我們選用成本較低的 instruction fetch & decode，將 trace 到的 block 使用 instruction fetch & decode 存到 EBB 中，這個作法也不用擔心其他 block 被 cache 置換掉使 EBB 變小的問題。另外，過去 EBB 有兩種版本，一種是 for interpter 而另一種是 for JIT 的，這導致作 JIT 時還要重新 extend 一次，現在則統一一種版本。 original EBB vs new EBB 指令數量
 Jim Huang changed 2 years agoView mode Like Bookmark
Extend basic block based on Control Flow Graph
按照 EBB 的定義，single entry and multiple exit，我們必須記錄 block 的進入點，如果該 block 的進入點有多個，就表示這個 block 不能 merge 到其他 block 之下，因為 merge 後這個 EBB 會變成有多個進入點而違反 EBB 的定義。記錄方式，先紀錄第一個 previous block 的 program counter，如果之後紀錄的 previous block 的 program counter 不同，那就代表有多個進入點，就把 can_merge 設定為 false. if (prev && next->can_merge) { if (!next->prev_pc) { next->prev_pc = prev->pc_start; if (prev->left) prev->right = next; else
陳彥甫 changed 2 years agoView mode Like Bookmark
2022q1 Homework1 (quiz1)
contributed by < qwe661234 > 測驗題目測驗一 Two Sum Hash Table Structure hash table bucket 以 non-circular doubly-linked list 來儲存 hlist_head: 指向 bucket list 的第一個節點
 Angus ku changed 2 years agoView mode Like Bookmark
Lab1: RV32I Simulator
contributed by < qwe661234 > GitHub: ComputerArchitectureHW/HW1-LeetCode-203-RISCV 203. Remove Linked List Elements This problem is based on LeetCode 203. Given the head of a linked list and an integer variable val, remove all the nodes of the linked list that has Node.val == val, and return the new head. ListNode Structure struct ListNode { int val;
陳彥甫 changed 2 years agoView mode Like Bookmark
2023q1 Homework3 (fibdrv)
contributed by < qwe661234 > 作業需求 GitHub ::: spoiler 實驗環境 $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
陳彥甫 changed 2 years agoView mode Like Bookmark
2023q1 Homework1 (lab0)
contributed by < qwe661234 > Implement queue.c queue structure Doubly-linked list and its value is string. struct list_head { struct list_head *prev; struct list_head *next; };
陳彥甫 changed 2 years agoView mode Like Bookmark
研究相關資料
實驗不同 Dispatch 機制 Tail Call Optimization wasm3 - a high performance WebAssembly interpreter with TCO Building the fastest Lua interpreter.. automatically! Parsing Protobuf at 2+GB/s: How I Learned To Love Tail Calls in C Re: Suggestions on implementing an efficient instruction set simulator in LuaJIT2 Code-copying Optimizing software-hardware interplay in efficient virtual machines interp - Testing interpreter dispatch methodsSwitching
陳彥甫 changed 2 years agoView mode Like Bookmark
gptcommit
GitHub 1. Install cargo $ curl https://sh.rustup.rs -sSf | sh $ source "$HOME/.cargo/env" 2. Check your rustc version >= 1.67.0 $ rustc --version 3. Install OpenSSL $ sudo apt-get install pkg-config libssl-dev 4. Install gptcommit
陳彥甫 changed 2 years agoView mode Like Bookmark
Code-copying Dispatch
Code-copying 動機 Code-copying 主要是為了降低一個 basic block 中的 instruction dispatch overhead 來勁一部提升 direct-threading 的表現. 實作概念就是將模擬各個指令之程式碼的起始記憶體位址與終止記憶體位址紀錄於陣列中，程式碼其實就是一串 machine code，接著建立一個可讀寫的記憶體空間，按照 basic block 中的指令順序，一一將指令之程式碼寫入該記憶體空間。執行時就直接執行該記憶體空間中的一連串 machine code。通過 code-copying 來的第一個好處就是減少 jump 指令次數，而且出現在這些 copying code 中的 jump 的指令更好預測。另一個好處則和 cache locality 有關，過往在非連續的記憶體位址中調用某一指令之程式碼，由於 cache 大小有限，當兩段程式碼之記憶體位址格的太遠時，就會發生 cache miss。然而，在 code-copying 機制中，一連串被調用的指令程式碼記憶體位址都是連續的，這樣的情況大大降低 cache miss 的機率。 Code-copying 實作 1. 建立 memory page 在 decode 後可以得知 basic block 中含有幾道指令，並以此算出這個 basic block 需要的準備幾個 memory page 來將指令程式碼複製進去。
陳彥甫 changed 2 years agoView mode Like Bookmark
Reduce rv32emu Dispatch Overhead
contributed by < qwe661234 > Repository: rv32emu Environment setup Clone the repository$ git clone https://github.com/sysprog21/rv32emu.git Follow the instructions of rv32emu Baseline: commit 3305664
陳彥甫 changed 2 years agoView mode Like Bookmark
Web49
Model: Core i7-8700
陳彥甫 changed 2 years agoView mode Like Bookmark
資訊科技產業專案設計作業4
268. Missing Number 😈interviewer: 潔西卡 😬interviewee: 魏昇芷題目敘述 😈: There is an issue in the company. We store every employee's information in dataBase and every employee has unique ID. This ID is an integer. Because the number of employee is n + 1, so the range of ID is from 0 to n. The issue is that we lost one employee's information in database, we want to find out which employee's information is lost. 😈: I further simplize the issue. These employees' ID store in an array. For example, if the array is [0, 1, 2, 3, 4, 6], you should tell me the missing employee's information whose ID is 5. The range of number in this array is from 0 to size of the array and all the number is integer. So, if this issue is dispatched to you, what is your solution? 參考解答
 陳彥甫 changed 2 years agoView mode Like Bookmark
wasm3 Strategy
clang compile 這邊要用 clang-14 才支援 musttail。 mkdir build cd build cmake -GNinja -DCLANG_SUFFIX="-14" .. ninja Tail Call 透過 gef 追蹤得知，程式主要由 main -> repl_call -> m3_CallArgv->op_Call->op_Compile->Compile_funciton->...，function dispatch 主要透過 op_Call 這個 function 完成，op_Call 這個 function 是由 d_m3Op (Call) 所展開，而 d_m3Op (Call) 最原始是由 RunCode 所展開，由程式碼可以發現 RunCode 的寫法符合 tail-call 的定義。然而，參考這篇 Parsing Protobuf at 2+GB/s: How I Learned To Love Tail Calls in C 發現 __attribute__((musttail)) 這個 attribute 目前移植性並不高，當我直接定義 M3_MUSTTAIL 為__attribute__((musttail)) 時，我的電腦並不支援。
陳彥甫 changed 2 years agoView mode Like Bookmark
rv32emu 加速規劃
參考 ria-jit IR 格式 { instruction name, rs1, rs2, rd, h1, h2, imm} * mnem : mnem * rs1, rs2, rd : * rs1_h1 means, that rs1 of the instruction * has to be the same as in the instruction at * the position stored in h1. (block_cache[pattern_start + h1]).
陳彥甫 changed 2 years agoView mode Like Bookmark
Homework3: SoftCPU
Setup Follow the instruction of Lab2: RISC-V RV32I[MACF] emulator with ELF support and srv32 - RISCV RV32IM Soft CPU to install riscv toolchain and srv32. Run ISS and RTL Simulator Follow the dirtectory structure of srv32/sw/hello to create another dirtectory hw3 under srv32/sw. There are two files in this directory, hw3.c and Makefile. hw3_isLongPressedName.c #include <stdio.h> #include <string.h>
陳彥甫 changed 2 years agoView mode Like Bookmark
Tail-call Optimization
如果一個函式 (function) 的最後行為是呼叫函式，我們稱這個最後行為為 tail call。舉個例子，考慮到以下程式碼，函式 A 的最後行為是執行函式 B int A(int data) { return B(data); } 要注意的是， tail call 的最後行為只允許呼叫函式，所以以下的程式碼就不是 tail call，因為他的最後行為是進行 2 * 1 + B(data) 的運算，而非單純呼叫函式。
陳彥甫 changed 2 years agoView mode Like 5 Bookmark
資訊科技產業專案設計作業3
contributed by <魏昇芷 - tissue> NVIDIA - Embedded Software Engineer What we need to see: Experience in relevant domain. Good English language skills to work effectively with global teams. Full experience at Linux, QNX or Android. Excellent C skills. Experience working on embedded systems and ARM processor specific.
陳彥甫 changed 2 years agoView mode Like Bookmark