COSCUP
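# Understanding Modern Processor Architecture with RISC-V from an HPC Developer's Perspective

{%hackmd @coscup/announcement-2025 %}

> 張瑞甫 / 黃敬群 <br>
> [slides](https://docs.google.com/presentation/d/1ZCfbFerY3CSJoK4tKXYY_LNo8ktW9_sJ5QGQxIMm6fs/edit?usp=sharing)

## Why Is HPC Related to Open CPUs?

* HPC (high-performance computing) is fundamentally a balance of performance, power, and scalability
* Open-source SoCs built around RISC-V lower the barrier to hands-on validation of architectural designs
* RISC-V: an open ISA, an active community, and the vector extension (RVV), which suits HPC scientific computing and AI workloads
* The EPI ([European Processor Initiative](https://www.european-processor-initiative.eu/)) project uses RISC-V control cores to coordinate HPC compute cores, showing that "HPC + open source" is feasible

> 10% of supercomputers are used for financial computing, including high-frequency trading $\to$ [High-frequency trading: tall trees catch the wind](https://www.clementi1962.com/words878.html)

### Where Technology and Applications Meet

* HPC workloads vary widely (CFD and AI inference place completely different demands on memory and compute units). Open-source CPU architectures such as RISC-V can be tuned for a specific workload:
  * Add or remove ISA extensions (RVV, Bitmanip, Crypto)
  * Modify the cache hierarchy, the number of memory channels, and the NoC topology
  * Add dedicated accelerators (FFT, matrix multiplication, sparse kernels)
* Open-source hardware communities in the HPC space are active (e.g. [CHIPS Alliance](https://chipsalliance.org/), [OpenHW Group](https://www.openhwgroup.org/)) and can quickly bring in:
  * Draft ISA extensions
  * The latest memory and interconnect protocols (e.g. CHI, UCIe)
  * Testbenches integrated with mainstream EDA/FPGA platforms

> HPC single-core performance is not necessarily good; it can even lose to a phone CPU

Workloads:

* CFD ([Computational Fluid Dynamics](https://en.wikipedia.org/wiki/Computational_fluid_dynamics))
  * Uses numerical methods such as the finite element method (FEM), finite difference method (FDM), and finite volume method (FVM)
  * Involves large sparse matrix operations, with matrices reaching orders in the millions
  * Memory access patterns are irregular and require frequent data exchange between neighboring elements
  * Resource requirements:
    - High memory bandwidth → data reaches the compute units in time
    - Low memory latency → less time spent waiting for data
    - Moderate to low arithmetic intensity (FLOPs/Byte)
* AI inference, taking CNNs / Transformers as examples
  * Large amounts of dense matrix multiplication (GEMM)
  * Highly parallelizable computation (SIMD / SIMT)
  * Weights can be tiled and kept in cache or SRAM, reducing DRAM accesses
  * Resource requirements:
    - High compute throughput, especially in vector/matrix multiplication units
    - High arithmetic intensity (high FLOPs/Byte) → depends more on compute units than on bandwidth
    - Large caches or on-chip buffers to relieve pressure on external memory

## HPC Performance Bottlenecks and Design Challenges

* Single-core IPC is approaching its physical limits
* HPC system performance is often held back by memory bandwidth, NoC latency, and cache coherence
* Power and area limits determine the PPA (Performance/Power/Area) trade-off
* STREAM benchmark results:
  * FPGA + DDR5 (51.2 GB/s) → 70% STREAM utilization
  * FPGA + HBM2E (460 GB/s) → 88% STREAM utilization
  * CPU throughput doubled with bandwidth unchanged → performance gain < 10%

> HPC is whole-system planning that covers both HW and SW
> Benchmarks for HPC: [LINPACK](https://top500.org/project/linpack/), STREAM

- [ ] Performance analysis

* FPGA + DDR5 (51.2 GB/s → 70% utilization): DDR5 is designed around DIMM slots, so signals travel across motherboard traces and latency is higher.
  - The memory controller has to handle row/column switching, precharge, and similar operations, which lowers bandwidth utilization.
  - Bandwidth is relatively limited, and the STREAM access pattern may not fully fill the channel, leaving idle cycles.
  - Result: about 36 GB/s sustained out of a 51.2 GB/s peak (about 70%).
* FPGA + HBM2E (460 GB/s → 88% utilization): HBM uses 2.5D/3D packaging, with memory and the FPGA die connected directly through a silicon interposer → latency drops significantly.
  - It has many independent channels (for example, 8 channels or 16 pseudo-channels per HBM2E stack), so reads and writes can be highly parallel and pipeline stalls are reduced.
  - The controller can drive all channels in a manner closer to streaming, lowering the share of protocol overhead.
  - Result: about 405 GB/s sustained out of a 460 GB/s peak (about 88%).

### Single-Core IPC Is Approaching Its Physical Limits

* IPC (Instructions Per Cycle) is limited by how much instruction-level parallelism (ILP) can be extracted and by branch prediction accuracy
* As process scaling runs into the power wall ([Power Wall](https://en.wikipedia.org/wiki/Multi-core_processor#Technical_factors)) and the memory wall ([Memory Wall](https://en.wikipedia.org/wiki/Memory_wall)), energy efficiency drops sharply even when the clock is raised
* Adding execution units or deepening the pipeline yields diminishing returns and instead increases latency and power
* Intel Core i7-2600 → i7-13700K: peak single-core IPC improved by less than 30%, while power rose from 95 W to 125 W (single-core load)
* [Pollack’s Rule](https://en.wikipedia.org/wiki/Pollack%27s_rule): performance grows only with the square root of circuit complexity, while power grows almost linearly with complexity
* The US National Research Council (NRC) report "[The Future of Computing Performance](https://nap.nationalacademies.org/catalog/12980/the-future-of-computing-performance-game-over-or-next-level)" (2011) states that single-core performance growth is no longer sustainable and that the field must move to parallel and multicore processor architectures

> Since 2004, raising single-core IPC has no longer been a way to improve PPA

### Memory Bandwidth / NoC Latency / Cache Coherence

* HPC often runs memory-bound workloads, where insufficient bandwidth leaves the CPU or GPU idle waiting for data
* NoC (Network-on-Chip) latency accounts for a noticeable share of the cost of moving data between cores
* Cache coherence protocols (MESI, MOESI) require extra message exchanges, adding latency and power
* 64-core RISC-V SoC simulation: when NoC latency rises from 2 ns to 8 ns, LINPACK performance drops by 30%

Recommended reading:

* [Spinlock evolution: efficient synchronization tailored to your multicore machine!](https://pretalx.coscup.org/coscup-2025/talk/7TLGQM/)
* [Linux spinlock scalability from the perspective of CPU cache coherence](https://hackmd.io/@sysprog/linux-spinlock-scalability)

> LINPACK performs matrix and vector operations

### PPA (Performance / Power / Area) Trade-offs

* The three trade off against each other: improving one may sacrifice another
* Widening the memory bus or enlarging the caches improves performance but costs area and power
* Introducing HBM modules → performance rises 2.5×, but power increases by 15%

### STREAM Benchmark

* STREAM measures sustainable memory bandwidth
* It is based on the following vector operations:
  1. Copy: `a[i] = b[i]`
  2. Scale: `a[i] = q * b[i]`
  3. Add: `a[i] = b[i] + c[i]`
  4. Triad: `a[i] = b[i] + q * c[i]`
* Highly memory-intensive with little computation → memory subsystem bottlenecks can be observed directly

### The Value of the RISC-V Vector Extension (RVV) in HPC

* RVV 1.0 (ratified in 2021): variable vector length that adapts to different data sets
* Typical HPC workloads: matrix multiplication, FFT, vector acceleration
* Compilers (LLVM/GCC) are strengthening [auto-vectorization](https://gcc.gnu.org/projects/tree-ssa/vectorization.html)
* OpenBLAS port to RVV: 3–6× faster than scalar code (depending on VLEN)

> Vector Pipeline

### CHI and Multicore Coherence

* CHI ([Coherent Hub Interface](https://developer.arm.com/documentation/ihi0050/latest/)): a cache coherence protocol from Arm that supports multicore processors and multi-node shared memory
* The [XiangShan](https://github.com/OpenXiangShan/XiangShan) project is one of the few open-source RTL designs that implement CHI
* With CHI, multicore synchronization latency can drop by about 20%

### How HBM Changes HPC

* A single DDR5 channel provides about 51.2 GB/s (e.g. a 64-bit DDR5-6400 DIMM)
* HBM2E reaches 460 GB/s per stack; HBM3 approaches 819 GB/s
* HBM uses 2.5D or 3D packaging: short distances, low latency, many parallel channels
* Drawback: heat dissipation is a major challenge

### Impact of NoC Latency on Performance

* NoC topologies: mesh, torus, ring
* Latency anywhere from 1 ns to 10 ns has a significant effect on system performance
* 64-core RISC-V SoC simulation: when NoC latency rises from 2 ns to 8 ns, LINPACK performance drops by 30%

> Full learning roadmap: ISA -> simulator -> RTL -> FPGA -> HPC benchmark -> multicore scaling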
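
As a rough illustration of the four STREAM kernels and the utilization figures quoted in this note, here is a minimal Python sketch. The function names (`stream_copy`, `stream_triad`, `arithmetic_intensity`, `utilization`) are hypothetical and not part of the STREAM distribution. The byte and FLOP counts assume 8-byte doubles and the usual STREAM convention of counting each array read or written once per iteration. Pure Python will not saturate a memory bus, so the sketch only shows what each kernel computes and how the percentages are derived; it is not a bandwidth measurement.

```python
from array import array

# (bytes moved, FLOPs) per loop iteration, assuming 8-byte doubles and
# STREAM-style counting: every array touched is counted once.
KERNEL_COST = {
    "copy": (16, 0),   # read b, write a
    "scale": (16, 1),  # read b, write a, one multiply
    "add": (24, 1),    # read b and c, write a, one add
    "triad": (24, 2),  # read b and c, write a, one multiply + one add
}


def stream_copy(a, b):
    for i in range(len(a)):
        a[i] = b[i]


def stream_scale(a, b, q):
    for i in range(len(a)):
        a[i] = q * b[i]


def stream_add(a, b, c):
    for i in range(len(a)):
        a[i] = b[i] + c[i]


def stream_triad(a, b, c, q):
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def arithmetic_intensity(kernel):
    """FLOPs per byte for one kernel; low values mean memory-bound."""
    nbytes, flops = KERNEL_COST[kernel]
    return flops / nbytes


def utilization(sustained_gbs, peak_gbs):
    """Fraction of peak bandwidth that is actually sustained."""
    return sustained_gbs / peak_gbs


if __name__ == "__main__":
    n = 8
    a = array("d", [0.0] * n)
    b = array("d", [2.0] * n)
    c = array("d", [0.5] * n)
    stream_triad(a, b, c, q=3.0)
    print(a[0])                                     # 2.0 + 3.0 * 0.5 = 3.5
    print(round(arithmetic_intensity("triad"), 3))  # 2 FLOPs / 24 bytes
    print(f"{utilization(36.0, 51.2):.0%}")         # DDR5 figure from the note
    print(f"{utilization(405.0, 460.0):.0%}")       # HBM2E figure from the note
```

Even Triad, the most compute-heavy of the four, performs only 2 FLOPs per 24 bytes moved, which is why STREAM results track the memory subsystem and not the compute units.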
