資訊科技產業專案設計課程作業 3

# 資訊科技產業專案設計課程作業 3 [Resume](https://drive.google.com/file/d/1Ir5n2tpYE-WH85MlmCM-6K0AhSRB4yr4/view?usp=drive_link) [Github](https://github.com/RBing123) ## 相關工作職缺項目 ### [Qualcomm](https://careers.qualcomm.com/careers) :::spoiler Job Description #### [CPU Architecture Performance Engineer](https://careers.qualcomm.com/careers?location=Hsinchu%20City%2C%20Hsinchu%20City%2C%20Taiwan&pid=446700807875&domain=qualcomm.com&sort_by=relevance) ++**General Summary**++: **The Potential Areas To Work Include** * The main responsibility of this position is to do the performance verification for world-class custom CPU for mobile and portable computers. **Roles and Responsibilities** * Proficiency in one or more areas of CPU architecture: fetch, decode, branch prediction, renaming, execute units, SIMD, load/store, MMU, caches, retire, etc. * Verify performance feature between RTL and model, and have ability to troubleshooting * Work with design team and performance team to develop test case and validate new feature **Preferred qualifications** * MS degree in Computer Architecture with 3-5 years of practical experience * Good CPU architecture knowledge and micro-architecture knowledge * Experience working in a RTL simulation environment * Experience working in a performance modeling environment * Proficient in Verilog, C and C++ and scripting languages such as Perl or Python * Ability to problem solve and prove your own ideas * Knowledge and experience with common performance benchmarks and workloads #### 匹配程度本身具有CPU架構的開發經驗，包含`Fetch`, `decode`等，但缺少關於`MMU`、`cache`的實作經驗以及沒有`RTL`相關的背景知識，此外，熟悉C/C++以及Python腳本撰寫。 ::: ### [AMD](https://careers.amd.com/careers-home) :::spoiler Job Description #### [GPU Architectural Modeling Engineer](https://careers.amd.com/careers-home/jobs/54158?lang=en-us) As a GPU Modeling Architect, you will work in a hardware/software co-design environment to craft industry leading simulation platforms and tools, have a strong voice in next generation of AMD products, and get to learn and contribute to the entire graphics pipeline and software/hardware stack. The ideal candidate is a highly skilled software designer and engineer, who is proficient in C++, and familiar with hardware architecture. Knowledge of the GPU hardware and the efficient use of discrete GPUs, APUs, and mobile devices is an advantage. The candidate will work in a big, multi-geo team environment on a variety of opportunities, so teamwork and drive are essential. **KEY RESPONSIBILITIES**: * Model and analyze hardware and software features to improve products performance, abilities, and cost. * Develop tools and algorithms to identify and eliminate performance limiters * Design, implement, assess, document and test new features for next-gen products * Be a developer, leader, and owner of the Architectural Modeling Tool **PREFERRED EXPERIENCE**: * Programming skills in C++ and python * Hardware architecture experience * Experience of modeling and simulation tools * Experience of C++ build tools such as CMake, Meson, Bazel * Experience in a Linux-based environment is a plus * Experience of GPU architecture, drivers and/or compilers is a plus * Experience of GPU general-purpose computing is a plus * Experience of D3D, Vulcan and OpenGL graphics applications is a plus #### 匹配程度 * 熟悉C++和Python程式撰寫 * 了解硬體架構，並且有十座RISC-V架構CPU的經驗，以及相關模擬器的使用，如: `Venus`、`Ripes` * 有大量OpenGL的開發經驗，並且有實作相關專案 * 缺少GPU的背景知識以及實作經驗 * 缺少Compiler的實作經驗 ::: ### [Realtek](https://recruit.realtek.com/Job/Search) :::spoiler Job Description #### [AI/ML/GPT模型架構師](https://recruit.realtek.com/Job/JobDetail?jobid=1937) > [相關面試經驗](https://hackmd.io/@accdlab/HkBANw4PP) 工作項目： 1. 設計與訓練CNN/GNN/Transformer架構的神經網路模型 2. 針對不同運算平台進行神經網路模型的優化 3. 開發生成式AI (GenAI)相關的網路模型與應用應徵條件： 1. 碩士以上；電子、電機、資工、通訊、電腦科學、數學統計相關科系畢業為主。 2. 熟悉C/C++/Python/Java, PyTorch/TensorFlow, ARM & RISC-V Assembly/Intrinsic/SIMD, Microsoft NNI or Alibaba TinyNAS相關工具 3. 具備數位訊號處理、機器/電腦視覺、機器/深度學習、平行運算、多模態融合等相關領域的知識 #### 匹配程度 * 熟悉C/C++以及Python * 透過Pytorch或keras開發生成式AI(如 : GAN、Diffusion model)，以及Transformer的訓練經驗 * 熟悉RISC-V Assembly，並且有RISC-V架構CPU的實作經驗 * 熟悉電腦視覺以及深度學習 * 缺少平行運算和數位訊號處理的經驗 * 缺少多模態融合相關知識 ::: ### [Novatek](https://www.104.com.tw/company/12nopku0?jobsource=cs_2018indexpoc) :::spoiler Job Description #### [AI影像視訊處理研發工程師(I1)](https://www.104.com.tw/job/66uof?jobsource=cs_2018indexpoc) 【工作說明】本職缺內容主要為研究與開發視訊處理與電腦視覺相關的演算法或硬體架構。利用深度學習或其他先進的演算法，改良現有的視訊處理效果與開發更先進的影像視訊處理方法，並且結合演算法與硬體知識，進一步最佳化深度學習的效能。 1. 利用Neural network來處理影像視訊的優化 2. 收集dataset來訓練網路 3. 最佳化model來符合硬體架構實現 4. DLA的架構規劃與設計 5. DLA相關的開發工具, 網路驗證, 與最佳化工具【必要條件】 1. MS/Ph.D. in CS/EE (or related major) 2. 熟悉深度學習的概念, 演算法與架構 3. 具深度學習工具的使用經驗. (ex. python, tensorflow, pytorch, caffe,....) #### 匹配程度 * 熟悉DL相關背景知識以及實作相關專案 * 具有Python, Pytorch的使用經驗以及對應的專案 * 缺少DLA(Deep Learning Accelerator)的使用經驗以及背景知識 ::: ## 分析 ### AI/GPU Job * 對於GPU相關的職缺來說，有RTL的經驗或者Compiler的經驗更為重要 * ML/DL 在大部分的Job Description來說更像是一個加分項 * 如果僅具備軟體能力，而缺乏硬體性能調整和最佳化的經驗，可能需要補充學習 ### Self-assessment * 具備 CG/ML/DL 相關背景，並且開發過相關專案，如: `Image Retargeting`, `Real-time Rendering for Transformer visual system` 等OpenGL pipeline * 熟悉電腦視覺以及生成式AI (Diffusion model, GAN)，並且有相關專案 * 具有開發RISC-V架構CPU的經驗，以及Computer Architecture的背景知識 * 缺乏DLA (Deep Learning Accelerator)的使用經驗以及背景知識 * 缺乏 RTL 相關的背景知識以及實作經驗 * 缺乏 GPU 的實作經驗 ## 模擬面試 > :man_in_tuxedo: : Interviewer > :dog: : Interviewee ### 1. Qualcomm :man_in_tuxedo: : 在分散式計算環境中，系統需要模擬大數運算以處理任意精度的計算任務。請實現一個函式，接受兩個以字串形式表示的浮點數，並返回它們的和。假設數字可能非常大，無法直接用內建的 float 或 double 類型表示。 :dog: : 字串形式的數字是否會包含負數或小數點？ :man_in_tuxedo: : 字串會包含負數跟小數點 :dog: : 如何處理空字串或非法輸入，如輸入 abc？ :man_in_tuxedo: : 直接回傳NULL就好 :dog: : 我返回的型態應該要是字串還是float/double的型態 :man_in_tuxedo: : 字串的型態 :dog: : 那我舉個例子，假設現在有兩個字串分別為 ``` string 1: 1.2 string 2: -2.0 ``` 經過函式計算後，我應該返回-0.8，那如果現在有另一筆輸入 ``` string 1: 0.2 string 2: abc ``` 我應該直接返回NULL，這樣是對的嗎 :man_in_tuxedo: : 對的 :dog: : 那我的想法應該會是把兩個字串的整數跟小數部分拆開，之後把整數跟小數部分補齊到相同長度，之後從小數部分的最低位開始相加，整數亦然，並且都需記錄進位，最後將兩者組合起來 :man_in_tuxedo: : 聽起來沒有問題，但要注意正負號相加的問題 :dog: : 那我的演算法如下， ``` void padding_r(char *str, int length){ int len = strlen(str); for(int i = len; i<length; i--){ str[i]='0'; } str[length]='\0'; } void padding_l(char *str, int length){ int len = strlen(str); if(len<length){ memmove(str + (length - len), str, len + 1); for (int i = 0; i < length - len; i++) { str[i] = '0'; } } } char *add_f(const char *num1, const char *num2){ char int1[100], frac1[100]; char int2[100], frac2[100]; sscanf(num1, "%[^.].%s", int1, frac1); sscanf(num2, "%[^.].%s", int2, frac2); // pad mantissa int frac_len1 = strlen(frac1); int frac_len2 = strlen(frac2); int max_frac_len = frac_len1 > frac_len2 ? frac_len1 : frac_len2; pad_right(frac1, max_frac_len); pad_right(frac2, max_frac_len); // pad integer int int_len1 = strlen(int1); int int_len2 = strlen(int2); int max_int_len = int_len1 > int_len2 ? int_len1 : int_len2; pad_left(int1, max_int_len); pad_left(int2, max_int_len); int carry = 0; char frac_result[101] = {0}; for (int i = max_frac_len - 1; i >= 0; i--) { int sum = (frac1[i] - '0') + (frac2[i] - '0') + carry; frac_result[i] = (sum % 10) + '0'; carry = sum / 10; } char int_result[101] = {0}; for (int i = max_int_len - 1; i >= 0; i--) { int sum = (int1[i] - '0') + (int2[i] - '0') + carry; int_result[i] = (sum % 10) + '0'; carry = sum / 10; } // carry char final_result[202] = {0}; if (carry) { sprintf(final_result, "1%s.%s", int_result, frac_result); } else { sprintf(final_result, "%s.%s", int_result, frac_result); } return strdup(final_result); } ``` :man_in_tuxedo: : 分析你的時間和空間複雜度 :dog: : 時間複雜度為O(max(n, m))，n 和 m 分別是兩個數字字串的長度，空間複雜度為O(max(n, m)) ### 2. AMD :man_in_tuxedo: : 在硬體中，記憶體通常是以層次結構（如多層快取或分層存儲）進行設計的。設計一個算法來確定最深的記憶體訪問層次（等價於二元樹的最大深度），以幫助分析最壞情況下的訪問時間。假設一個記憶體分層結構被建模為一個二元樹，其中每個節點代表一層記憶體存取。節點的左子樹和右子樹分別表示緩存命中和緩存未命中。請設計一個算法，計算該記憶體模型的最大訪問深度。 :dog: : 那root節點的深度是0還是1 ? :man_in_tuxedo: : 我們假設root節點的深度是0 :dog: 所以我應該找出這棵樹的最大深度，而這個深度就是該記憶體模型的最大訪問深度 :man_in_tuxedo: : 對的 :dog: : 所以說如果現在有一棵二元樹為，並且假設如果緩存有命中的話為T，沒命中的話為F ``` F / \ F T ``` 在這個情況下我應該返回1，那會有root節點不存在的情況嗎? :man_in_tuxedo: : 假設起始的root節點不會不存在 :dog: : 那我應該會使用recursive的方法來解決這問題，我們可以使用preorder的方式來走訪整棵樹，並且記錄它的深度 ``` typedef struct{ bool hit; treenode *left, *right; } treenode; int cal_depth(treenode *root, int depth){ if(!root) return depth; if(root->hit) depth+=1; int left_depth = cal_depth(root->left, depth); int right_depth = cal_depth(root->right, depth); return left_depth>=right_depth ? left_depth : right_depth; } int max_depth(treenode *root){ int depth = 0; return cal_depth(root, depth); } ``` :man_in_tuxedo: : 我覺得方法可以進行optimize，對於空間的使用上可以比較少 :dog: : 對，如果要做到空間上的最佳化，應該像是這樣 ``` typedef struct{ bool hit; treenode *left, *right; } treenode; int cal_depth(treenode *root, int cur_depth, int *max_d){ if(!root) return *max_d; if(root->hit && cur_depth > *max_d) *max_d = cur_depth; cal_depth(root->left, cur_depth+1, max_d); cal_depth(root->right, cur_depth+1, max_d); return *max_d; } int max_depth(treenode *root){ int depth = 0; return cal_depth(root, &depth); } ``` :man_in_tuxedo: : 那妳可以分析你程式的時間跟空間複雜度嗎? :dog: : 時間複雜度應該是O(n)，n為節點數，空間複雜度為O(h)，h為樹的高度 :man_in_tuxedo: : 你舉個例子來證明你的演算法有效 :dog: : OK，假設現在有一顆像是這樣 ``` miss / \ miss miss / \ miss hit ``` 如果在這個情況下，我的演算法應該會返回2 ### 3. Realtek :man_in_tuxedo: : 在 C 語言中，malloc 函數用於動態分配記憶體空間。請實現一個函式來分配一塊記憶體，該記憶體的大小為 n，並且地址必須對齊到指定的對齊值 k。 :dog: : 會提供Function Prototype嗎? :man_in_tuxedo: : 有的，Function Prototype為 ```c void* aligned_malloc(size_t n, size_t k); ``` :dog: : 對齊值 `k` 是否保證是 2 的次方？或者是否需要處理非 2 的次方的情況？ :man_in_tuxedo: : 是的，可以假設 `k` 是 2 的次方。 :dog: : 那麼需要考慮的記憶體分配失敗情況嗎？`malloc` 返回 `NULL` 時是否需要報錯或拋出異常？ :man_in_tuxedo: : 是的，你需要考慮 `malloc` 返回 `NULL` 的情況，因為這是一個通用的場景，特別是在內存受限的環境中。 :dog: : 好，那我想我可以舉兩個例子，分別是 `malloc` 成功跟 `malloc` 失敗的例子 :dog: : 1. `malloc` 成功的例子假設現在我們要分配 100 byte的空間，並且對齊到 16 byte。因此我們請求的空間需求 (size) 是100 byte，對齊要求 (alignment) 是16 byte，因此函式的返回地址應該要是16的倍數。所以我們分配 `size + alignment - 1 + sizeof(void *)`的空間，其中`sizeof(void *)` 是為了儲存原始pointer，這樣後面可以正確釋放記憶體。`alignment - 1` 是利用bitwise operation來計算對齊的起始位置，因為`alignment`是2的次方，因此會滿足`alignment & (alignment - 1)==0`的特性 2. `malloc` 失敗的例子假設現在我們要分配 0 byte的空間，並且對齊到 16 byte。那我們的函式應該返回 `NULL` :man_in_tuxedo: : 你的理解是對的 :dog: : 好，那我想我的方法應該會是這樣 ```c void *aligned_malloc(size_t size, size_t alignment){ if(alignment == 0 || (alignment & (alignment - 1))!=0) return NULL; void *original = malloc(size + alignment-1); if(original == NULL){ return NULL; } uintptr_t aligned_address = ((uintptr_t)original + alignment - 1) & ~(alignment - 1); return (void *)aligned_address; } ``` :man_in_tuxedo: : 簡單說明一下時間跟空間複雜度 :dog: : 這個方法的時間複雜度是O(1)，空間複雜度也是O(1)，那我舉個例子來測試我的方法是有效的。假設現在需要分配的空間是 100 byte，對齊的要求是 16 byte，程式一開始會先對於對齊的要求進行確認，確認是否為2的N次方。之後我們會去配置`size+alignment-1+sizeof(void *)`的空間，如果配置失敗的話會回傳 `NULL`，而我們配置的大小是100+16-1+4=119，並且假設我們`original`指向的位置是`0x1000`，那回傳的`aligned_address`就會等於 ```c aligned_address = (0x1000 + 16 - 1) & (16 - 1) = 0x1015 & ~0xF = 0x1015 & 0xFFFFFFF0 = 0x1010 ``` ### 4. Novatek :man_in_tuxedo: : 解釋 multi-thread, multi-process :dog: : Multi-thread為在同一個程序中同時執行多個執行緒，multi-thread的特點是共享相同的記憶體空間，並且建立和切換速度快，但一個執行緒崩潰可能影響整個程序。Multi-process為同時執行多個獨立的程序，它的特點是每個程序有獨立的記憶體空間，程序間通信需要特殊機制(如管道、共享記憶體)。 :man_in_tuxedo: : 解釋 deadlock，怎麼避免 :dog: : Deadlock是指兩個或更多執行緒/進程互相等待對方釋放資源，導致所有相關執行緒/進程都無法繼續執行的狀態。而deadlock發生的四個必要條件： 1. 互斥(Mutual Exclusion)：資源同一時間只能被一個執行緒使用 2. 持有並等待(Hold and Wait)：執行緒持有資源的同時等待其他資源 3. 不可搶奪(No Preemption)：資源不能被強制從執行緒中釋放 4. 循環等待(Circular Wait)：執行緒之間形成一個等待環可以使用固定順序獲取鎖、使用超時機制或使用更高級的同步機制來避免deadlock > 源自於[Operating system concepts](https://fjuedu-my.sharepoint.com/personal/406401484_m365_fju_edu_tw/_layouts/15/onedrive.aspx?id=%2Fpersonal%2F406401484%5Fm365%5Ffju%5Fedu%5Ftw%2FDocuments%2Fseanpeng12%40gmail%2Ecom%E9%9B%B2%E7%AB%AF%E7%A1%AC%E7%A2%9F%E5%82%99%E4%BB%BD%2F%E8%B3%87%E5%B7%A5%E6%89%80%E8%80%83%E8%A9%A6%2FSilberschatz%5F%2D%5FOperating%5FSystem%5FConcepts%5F10th%5FEdition%5Fc2018%5Ftxtbk%2Epdf&parent=%2Fpersonal%2F406401484%5Fm365%5Ffju%5Fedu%5Ftw%2FDocuments%2Fseanpeng12%40gmail%2Ecom%E9%9B%B2%E7%AB%AF%E7%A1%AC%E7%A2%9F%E5%82%99%E4%BB%BD%2F%E8%B3%87%E5%B7%A5%E6%89%80%E8%80%83%E8%A9%A6&ga=1) :man_in_tuxedo: : 解釋 preemptive :dog: : Preemptive是一種進程/線程的調度方式，允許作業系統暫停正在執行的進程，將CPU分配給其他具有更高優先級的進程。 ## 參考資料 * [新鮮人面試心得(ByteDance/Qualcomm)](https://www.ptt.cc/bbs/Soft_Job/M.1614699034.A.FD4.html) * [25家面試心得](https://www.ptt.cc/bbs/Tech_Job/M.1641731063.A.A3D.html) * [2021 ML / SWE 面試心得（文長）](https://www.ptt.cc/bbs/Soft_Job/M.1628227346.A.7C5.html) * [面試心得分享](https://www.ptt.cc/man/Tech_Job/DB04/D742/D49D/M.1565533358.A.719.html) * [2022面試心得](https://hackmd.io/@aPlFSlXZR3GJuqovYdN_qg/H1Y2YK6Di) * [[心得] 研替面試心得系列【5 聯詠 Novatek】](https://www.ptt.cc/bbs/Tech_Job/M.1449065803.A.D26.html)