Shengyuu

@_01X9rimQmWH33Djf8QhoA

Joined on Mar 4, 2018

  • Convolution CL Codes vstore2 and vload2 function __kernel void default_function_kernel0(__global float* restrict A, __global float* restrict W, __global float* restrict B) { float B_local[64]; __local float Apad_shared[8]; __local float W_shared[128]; float Apad_shared_local[8]; float W_shared_local[8]; for (int ff_c_init = 0; ff_c_init < 4; ++ff_c_init) {
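  • Analyzing the Red-Black Tree (RED-BLACK TREE) Overview An RB tree is a special kind of balanced tree. As a balanced tree, it keeps insertion, deletion, and search at O(log n). What sets it apart is that its tree height h is defined differently from that of other balanced trees (it uses the black height), and this makes rebalancing a red-black tree cheaper than rebalancing other balanced trees :::info Black height bh(x): the number of black nodes encountered on the path from x to any of its descendant leaves (not counting x itself) ::: Definition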
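The black-height definition above can be sketched in C. This is only an illustration under stated assumptions: the `rb_node` layout and the function names are hypothetical, and it follows the common textbook convention that a `NULL` child stands for a black NIL leaf that is counted on the path.

```c
#include <stddef.h>

/* Hypothetical node layout, for illustration only. */
enum rb_color { RB_RED, RB_BLACK };

struct rb_node {
    enum rb_color color;
    struct rb_node *left;
    struct rb_node *right;
};

/* A NULL child is treated as a black NIL leaf (assumed convention). */
static int rb_is_black(const struct rb_node *n)
{
    return n == NULL || n->color == RB_BLACK;
}

/*
 * bh(x): number of black nodes on a path from x down to a leaf,
 * not counting x itself. Returns -1 when the paths below x disagree,
 * i.e. the subtree violates the black-height property.
 */
int rb_black_height(const struct rb_node *x)
{
    if (x == NULL)
        return 0; /* nothing lies below a NIL leaf */

    int lh = rb_black_height(x->left);
    int rh = rb_black_height(x->right);
    if (lh < 0 || rh < 0)
        return -1;

    /* Count the child itself when it is black, then everything below it. */
    int via_left = lh + (rb_is_black(x->left) ? 1 : 0);
    int via_right = rh + (rb_is_black(x->right) ? 1 : 0);

    return via_left == via_right ? via_left : -1;
}

/* Small demo: a black root with two red children has bh(root) == 1. */
int rb_demo(void)
{
    struct rb_node l = { RB_RED, NULL, NULL };
    struct rb_node r = { RB_RED, NULL, NULL };
    struct rb_node root = { RB_BLACK, &l, &r };
    return rb_black_height(&root);
}
```

Because red nodes do not add to the count, two paths can differ in length while still having the same black height, which is the looser notion of balance the overview refers to.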
  • Simulation Platform: https://drive.google.com/file/d/16xIMCTcmbXTVWEEe6Ru0KnbB37kOYD_j/view?usp=sharing Architecture Overview Manual Password Ubuntu password: caslabgpu
  • REQUIREMENTS The processor core shall target an operating speed of at least 75 MHz for the post-synthesis netlist. Its instruction set shall have at least 45 instructions, including branch and I/O instructions. IM + DM shall not exceed 320 KB. The silicon area of the CPU + ICache + DCache + IM + DM shall be confined within 110 mm² in total. The EPU shall be synthesized and constrained within 3 mm². The kernel of the chip shall be less than 120 mm². The read/write access time of the off-chip, non-synthesized memory (usually DRAM) is 60 ns. The main memory has only one read/write port, with a bit width of 32.
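  • Title: Forward-Looking GPGPU XXXXXXX (TBD) Goal: complete the GPGPU IP Design Content When introducing the goals, start with a brief survey of the technology currently on the market (e.g. NVIDIA, AMD), then describe the goals we want to reach and how we will actually reach them. It does not need to go into much detail, but the overall direction and the implementation approach must be explained reasonably and understandably, with plenty of figures orz There are many documents to draw on for the existing-technology survey; just organize them, fix the layout, and copy-paste Add as much filler as you can The goal descriptions should be on the long side, about 3~5 Word pages per item Please send me a first draft by ==12/7==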
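  • Note In SystemC 2.3.2 (2.3.3 seems fine), the sockets must be initialized in the constructor in the same order as they are declared in dma.h; otherwise an error occurs b_transport needs a wait(); otherwise writes to the control reg do not go through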
  • Week 1 (Introduction & the Cool Programming Language) Structure of a compiler Lexical Analysis Whitespace is one of the tokens Parsing Turns a piece of code into a parse tree
  • TCP TCP (Transmission Control Protocol) Using Wireshark to analyze pcap files captured by tcpdump TCP three-way handshake Lab 8: TCP protocol analysis DDoS
  • Paper Structure Abstract INTRODUCTION BACKGROUND AND MOTIVATION 2.1 Sparsity-Centric Optimization for DNNs 2.2 Existing Sparsifying Methods on GPUs 2.3 The Characterization of Sparsity
  • 1. Create an Ubuntu bootable USB drive Download the ubuntu 18.04.03 ISO image ubuntu 18.04.03 Download balenaEtcher, which is used to create the bootable USB drive balenaEtcher
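  • EASY 1. Hello_CTF Solution approach This challenge is probably meant to tell us that every password has the format EENS{XXX}; the answer is written in the challenge itself 2. You can't type Problem analysis This challenge provides a korea.PNG file to download. Opening the image shows that the password is written in it, but the password contains Korean characters; most students cannot read Korean, and the text in the image cannot be copied directly. Solution approach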
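  • Using the valgrind profiling tool to compare the heap consumed by lenet and yolov3-tiny under MDFI and Darknet Command reference 5.3.3. Profiling heap and stack space with Massif Darknet for yolov3-tiny In the makefile, set the DEBUG parameter to 1 and re-run make Run darknet and generate massif.out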
  • Outline AI course Computer Organization lab ARC Contest Code generation AI course Week 1
  • # Papers ## ISCA 2018 ### GPUs - [ ] [RegMutex: Inter-Warp GPU Register Time-Sharing](https://drive.google.com/open?id=1hXG25sjvY-7Ldf6Q5GlkNwoksV450PtI) - share a subset of physical registers between warps during the GPU kernel execution. >warp: the smallest unit executed by each SM in CUDA >SM: Streaming Multiprocessor >Reference: [CUDA 的 Threading:Block 和 Grid 的設定與 Warp](https://kheresy.wordpress.com/2008/07/09/cuda-%E7%9A%84-threading%EF%BC%9Ablock-%E5%92%8C-grid-%E7%9A%84%E8%A8%AD%E5%AE%9A%E8%
  • # Assignment3: SoftCPU ## Modify the assembly programs for Reindeer Simulation with Verilator. ### A new Makefile In order to run the assembly program with Verilator, we have to generate the `.elf` file, and we can find some clues in the `Makefile` of `riscv-compliance` for generating the `.elf` file. These are the messages printed when I run `make` in the `riscv-compliance` folder. ![](https://i.imgur.com/CM1Plcb.png) In this picture we can find the library paths and the `link` file we need when compi
  • # Assignment2: RISC-V Toolchain ## 1. Pick two assembly programs from [Assignment1: RISC-V Assembly and Instruction Pipeline](https://hackmd.io/ZHsZ-7HBTvqtH8SoDHxQoQ) and rewrite them as C implementations that can run on [rv32emu](https://github.com/sysprog21/rv32emu). ### Bit Reverse Given an input number, reverse every bit of the number. #### C code for emu-rv32 ```cpp= void _start() { volatile char* tx = (volatile char*) 0x40002000; unsigned int example = 0x12345678
  • # Assignment1: RISC-V Assembly and Instruction Pipeline -- Bubble Sort contributed by < `Shengyuu` > ## C code Sort the integer array `arr[]`, which holds 5 integers, with the bubble sort algorithm ```cpp #include <stdio.h> int main() { int arr[5]; arr[0] = 3; arr[1] = 5; arr[2] = 1; arr[3] = 2; arr[4] = 4; int tmp; for(int i = 0; i < 4; i++){ for(int j = 0; j < 4 - i; j++){ if(arr[j] > arr[j + 1]){ tmp = arr[j];
  • # 2019q3 Homework2 (lab0) contributed by < `Shengyuu` > ## Description - [Assignment requirements](https://hackmd.io/s/BJA8EgFB4) - [C Programming Lab](http://www.cs.cmu.edu/~213/labs/cprogramminglab.pdf) ## Experiment environment ```shell $ uname -a Linux yao 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linu $ gcc --version gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0 ``` ## Assignment requirements ### Implement a FIFO & LIFO Queue - Requirements - `q_new`: create an empty queue - `q_free`: free the queue - `q_insert_head`: at the head of the queue, insert an el
  • # 2019q3 Homework1 (review) contributed by < `Shengyuu` > ## Problem set 1 Consider the following C program. The align4 macro finds, for 4-byte alignment, the rounded-up aligned address of a given address. ```cpp #include <stdio.h> #define align4(x) (((x) + K) & (-4)) int main(void) { int p = 0x1997; printf("align4(p) is %08x\n", align4(p)); return 0; } ``` Expected program output: `align4(p) is 00001998` ### Solution approach 4-byte alignment in this problem means aligning the given address so that it is always a multiple of 4. There are two ways to reach a multiple of 4, `round up` and `round down`. For example, aligning `p = 0x1997` from the problem to `0x1998` is
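The two rounding directions mentioned above can be sketched in C as follows. This is a minimal sketch that assumes K = 3, a value the preview does not state; the helper names `align4_up`, `align4_down`, and `align4_demo` are hypothetical. It masks with `~3` on `uintptr_t` rather than the macro's `-4` on `int`; in two's complement both clear the two low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of 4 (assumes K = 3 in the quiz macro). */
uintptr_t align4_up(uintptr_t x)
{
    /* Adding 3 pushes any non-multiple past the next boundary;
     * clearing the two low bits then snaps it back onto that boundary. */
    return (x + 3u) & ~(uintptr_t)3;
}

/* Round x down to the previous multiple of 4. */
uintptr_t align4_down(uintptr_t x)
{
    return x & ~(uintptr_t)3;
}

/* Checks the value from the problem statement plus the neighbouring cases. */
void align4_demo(void)
{
    assert(align4_up(0x1997) == 0x1998);   /* the quiz's expected output */
    assert(align4_up(0x1998) == 0x1998);   /* already aligned: unchanged */
    assert(align4_up(0x1999) == 0x199c);
    assert(align4_down(0x1997) == 0x1994);
}
```

Adding 3 rather than 4 is what leaves an already aligned address unchanged: with 4, `0x1998` would be bumped to `0x199c`.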
  • # Code Generation ## Definition - What does Code Generation mean? Code generation is a mechanism where a compiler takes the source code as an input and converts it into machine code. This machine code is actually executed by the system. **Code generation is generally considered the last phase of compilation, although there are multiple intermediate steps performed before the final executable is produced.** These intermediate steps are used to perform optimization and other relevant processe