# Software Analysis and Optimization

## Environment setup: troubleshooting

### Copy and paste

sudo apt install open-vm-tools -y
sudo reboot

### Not enough disk space

Resize the disk partition:

lsblk
sudo pvs
sudo resize2fs /dev/sda2

### 1. Network problems

sudo apt update often stalls at 0% [Working]. Test connectivity first:

ping -c 4 8.8.8.8
ping -c 4 archive.ubuntu.com

Check whether both pings succeed. If they do not:

Step 1: Check the VMware network mode

First check which mode is set in this VM's network settings in VMware:
Shut down the Ubuntu VM (or suspend it first).
Open VMware → select your Ubuntu VM → click "Edit virtual machine settings".
Look at the "Network Adapter" entry.

The three common modes:

| Mode | Behavior | Internet access |
| --- | --- | --- |
| NAT | Shares the host's network connection | ✅ Recommended |
| Bridged | Connects directly to the physical network adapter | ✅ May need to be allowed through the firewall |
| Host-only | Only the host and the VM can reach each other | ❌ No internet access |

👉 If it shows Host-only, switch it to NAT.
Then be sure to restart the virtual machine!

### 2. Using SCP

Install the SSH server:
sudo apt install openssh-server -y

Check its status:
sudo systemctl status ssh

Enable and start SSH:
sudo systemctl enable ssh
sudo systemctl start ssh

## hw1:

### 1. Copy the file into Ubuntu

scp "C:\Users\chuan\Downloads\dhrystone_v1.tar.gz" edward@192.168.6.129:~

### 2. Extract the archive

Use the name of the archive you actually copied over:
tar -xzvf 613410058_hw1.tar.gz

### 3. Compile the program

gcc -DUNIX dhry21a.c dhry21b.c timers_b.c -o myprog.exe -pg

If gcc is missing, install it -> sudo apt install gcc

### 4. Install gprof2dot

pip install gprof2dot

If pip is missing, install it -> sudo apt install python3-pip

pip may then refuse to install gprof2dot unless a virtual environment is set up first. Use Miniconda for that ->
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

This puts you in the conda base environment. Alternatively, create and activate a dedicated environment:
conda create -n prof_env python=3.10 -y
conda activate prof_env

### 5. Continue

./myprog.exe
gprof ./myprog.exe gmon.out > gprof_output.txt
gprof ./myprog.exe | gprof2dot -f prof | dot -Tpng -o call_graph.png

If dot is missing, install graphviz -> sudo apt install graphviz

Once everything is organized, compress it -> tar -czvf xxxxxxxxxxx.tar.gz xxxxx_hw1

## hw2:

sudo apt update

### 1. Install lcov and gcc

sudo apt install gcc
sudo apt install build-essential lcov

### 2. Change the compile flags (the assignment requires -O1)

Go to the practice1 directory:
nano Makefile

The original may look like this:
FLAGS = -O2

Change it to:
FLAGS = -O1 --coverage

When done, save the file and exit the editor (in nano: press Ctrl+X, then Y, then Enter).

### 3. Run the program

make clean (removes temporary files and the previous build)
make
./dhry21

### 4. View the coverage report

lcov --capture --directory . --output-file coverage.info
genhtml coverage.info --output-directory coverage_report

Open coverage_report/index.html.

## hw3: VTune

### 1. Download the Intel oneAPI Base Toolkit

https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?packages=oneapi-toolkit&oneapi-toolkit-os=linux&oneapi-lin=offline

Version used: 2024.2.1. A newer release could not run on this CPU.

chmod +x l_BaseKit_p_2024.x.x.xxxxx_offline.sh
sudo sh ./l_BaseKit_p_2024.x.x.xxxxx_offline.sh

### 2. Run the self-checker

The script is /opt/intel/oneapi/vtune/latest/bin64/vtune-self-checker.sh. Go to bin64 and run it:
./vtune-self-checker.sh

Check each analysis for fail or success. In my case Performance Snapshot succeeded and Hotspots failed.

#### Fixing the Hotspots failure

![image](https://hackmd.io/_uploads/rymSSNIAgx.png)

The sudo sysctl --system step can be skipped.
Reboot the virtual machine and run the checker again; Hotspots should then work.

### 3. Load the environment variables

Source the script from the oneAPI folder:
source /opt/intel/oneapi/setvars.sh

### 4. Compile the program

gcc:
gcc -g -O2 chomp.c -o chomp_gcc

Intel compiler:
icx -g -O2 chomp.c -o chomp_icx

### 5. Launch the VTune GUI

vtune-gui

## hw4

### 1. Compile with gcc (flags -O2 -DUNIX)

gcc -o nsieve -O2 -DUNIX nsieve.c -fopt-info-vec-all

### 2. Check the report

Write the report to the file nsieve.gcc.optrpt:
gcc -o nsieve -O2 -DUNIX nsieve.c -fopt-info-vec-all 2> nsieve.gcc.optrpt

View the file:
cat nsieve.gcc.optrpt | less
(press q to quit)

### 3. Compile with the Intel compiler

Load the environment variables:
source /opt/intel/oneapi/setvars.sh

Compile:
icx -O2 -DUNIX -qopt-report -o nsieve nsieve.c

### 4. View the report file nsieve.optrpt

cat nsieve.optrpt

## Finalproject (sjeng)

### 1. Clone the repository

git clone https://github.com/gcp/sjeng.git

### 2. Enter the directory

cd sjeng

### 3. Run the setup scripts

autoreconf -f -i
./configure

### 4. Change the flags in the Makefile (-g -pg -O0) (gcc)

make clean

### 5. Run the program

./sjeng

The engine starts playing chess.

sudo apt install xboard
xboard -> new -> go
quit

### 6. Fix the workload

Search to a fixed depth of 8 on every run (beyond 8 to 10 plies the run time rises sharply).

./sjeng
xboard
new
sd 8 (sets the search depth to 8 plies)
go

### 7. Run gprof

gprof ./sjeng > sjeng_gprof_report.txt
gprof ./sjeng | gprof2dot --output sjeng_callgraph.dot
dot -Tpng sjeng_callgraph.dot -o sjeng_callgraph.png

### 8. Compile with icx

make clean
make CC=icx CFLAGS="-g -pg -O3" LDFLAGS="-pg -lgdbm"

::: danger
Compiling with icx produced the problem below:
![image](https://hackmd.io/_uploads/SkgSv8Xxbg.png)
:::

#### 1. Open sjeng.c and add this below the includes

bool init_segtb();

#### 2. Open segtb.c and add

int load_2piece(void);
int load_3piece(int w1_man, int b1_man, int b2_man, int table);

### 9. Data

#### 1. gcc_-O0

(sd = 10)
![image](https://hackmd.io/_uploads/Bye2t4Xxbe.png)
![image](https://hackmd.io/_uploads/SkM6KN7xbg.png)
![sjeng_callgraph_gcc_O0](https://hackmd.io/_uploads/B1eO5aE7lZx.png)
hotspots
![image](https://hackmd.io/_uploads/Hymg4SQe-x.png)
performance snapshots
![image](https://hackmd.io/_uploads/SJBtKBQlZe.png)

(sd = 8)
![image](https://hackmd.io/_uploads/SJ6ynrQx-e.png)
![image](https://hackmd.io/_uploads/SkkE2H7xbl.png)
![sjeng_callgraph_gcc_O0](https://hackmd.io/_uploads/BkTahHmxZe.png)
![image](https://hackmd.io/_uploads/H1IDpr7l-x.png)
![image](https://hackmd.io/_uploads/rk3oaH7lbg.png)

#### 2. gcc_-O3

![image](https://hackmd.io/_uploads/Bkf-Crmebx.png)
![image](https://hackmd.io/_uploads/H1SzAHQxWg.png)
![sjeng_callgraph_gcc_O3](https://hackmd.io/_uploads/S1zCW87ebx.png)
![image](https://hackmd.io/_uploads/B1UjXL7lZx.png)

## Finalproject 2

### 1. Mount the ISO file

cd to the location of the ISO file, then:
sudo mount cpu2017-1.1.9.iso /mnt
cd /mnt
./install.sh # when prompted, enter the install location
sudo fuser -km /mnt # run this first if "/mnt: target is busy" appears
sudo umount /mnt

### 2. Configure

export SPEC=~/finalproject
export PATH=$SPEC/bin:$PATH
source ~/.bashrc
cd $SPEC
cp config/Example-gcc-linux-x86.cfg config/mytest_gcc.cfg
nano config/mytest_gcc.cfg

![image](https://hackmd.io/_uploads/H1WtD48eZl.png)
![image](https://hackmd.io/_uploads/SyEcwVUgWe.png)

### 3. Run the benchmark

Build:
runcpu --config=mytest_gcc --action=build --tune=base 631.deepsjeng_s

Run:
runcpu --config=mytest_gcc --action=run --tune=base --size=train 631.deepsjeng_s

Process the gmon output:
cd benchspec/CPU/631.deepsjeng_s/run/run_base_train_mytest-m64.0000
gprof deepsjeng_s_base.mytest-m64 gmon.out > gprof_O0_report.txt
gprof deepsjeng_s_base.mytest-m64 gmon.out | gprof2dot -f prof | dot -Tpng -o deepsjeng_O0_callgraph.png

View the report:
less gprof_O0_report.txt

Run VTune:
Locate the executable and enter train.txt in the parameters field (for the ref input, use ref.txt; still being tried).
You can see the computation running in the background.
![image](https://hackmd.io/_uploads/HJUmP48lWe.png)

For -O3, change the base optimization flag in the config to -O3 and repeat the same steps. Note that the earlier output files get overwritten.

For icx:
cp config/Examplexxxxxxxxx.cfg config/mytest_icx.cfg (replace xxxx with the Intel example config)
Before running, source /opt/intel/oneapi/setvars.sh
![image](https://hackmd.io/_uploads/BkWPQ6UlZl.png)
![image](https://hackmd.io/_uploads/ryivXTIlWe.png)
runcpu --config=mytest_icx_O0 --action=build --tune=base 631.deepsjeng_s

### Problems (later resolved by increasing RAM to 16 GB)

nano mytest_gcc.cfg
cd /home/edward/finalproject/
mv mytest_gcc.cfg config/
runcpu --config=mytest_gcc --action=build 631.deepsjeng_s

::: danger
![image](https://hackmd.io/_uploads/ry_C4OSlZl.png)
:::

runcpu --config=mytest_gcc --size=ref --action=run --tune=base 631.deepsjeng_s (runs base only)

::: danger
![image](https://hackmd.io/_uploads/B12MHOBlbl.png)
:::

## Finalproject experiment log (input_data: ref)

### 1. gcc_O0

![image](https://hackmd.io/_uploads/BkXxsmqlZl.png)
![deepsjeng_O0_callgraph](https://hackmd.io/_uploads/S1IF2Q9xWe.png)
hotspot
![image](https://hackmd.io/_uploads/rJS7HUnZ-g.png)
flame graph
![image](https://hackmd.io/_uploads/BJt3SLnbbg.png)

### gcc_O0 (without -pg)

541
![image](https://hackmd.io/_uploads/SJXZOGxzWe.png)
![image](https://hackmd.io/_uploads/r1_L0dhb-e.png)
![image](https://hackmd.io/_uploads/rJIt0Onb-g.png)
![image](https://hackmd.io/_uploads/BkQVgY3Wbg.png)
thread1(059)
![image](https://hackmd.io/_uploads/rkj_iJkG-x.png)
thread4(063)
![image](https://hackmd.io/_uploads/ryP7UlJf-x.png)
thread8(065)
![image](https://hackmd.io/_uploads/Syq80l1G-x.png)

### gcc_O1 (067)

![image](https://hackmd.io/_uploads/rJ10s-kG-l.png)

### gcc_O2 (069)

![image](https://hackmd.io/_uploads/BJK5JGJfZx.png)

### 2. gcc_O3

![image](https://hackmd.io/_uploads/SkifPNqeWl.png)
![image](https://hackmd.io/_uploads/S1gg_E9xWl.png)
![image](https://hackmd.io/_uploads/B1mvxt3-Zx.png)
qsearch
![image](https://hackmd.io/_uploads/B1vQbtnZ-x.png)
search search search
![image](https://hackmd.io/_uploads/H1BIWK3--e.png)

### gcc_O3 (without -pg) (055)

123
![image](https://hackmd.io/_uploads/HkrQr5hWWg.png)
![image](https://hackmd.io/_uploads/HkawrcnZWe.png)
qsearch
![image](https://hackmd.io/_uploads/H1gkU93Z-e.png)
search search search
![image](https://hackmd.io/_uploads/r1wbUqn-bx.png)
![image](https://hackmd.io/_uploads/rkYvxzkMbg.png)

### 3. icx_O0

![image](https://hackmd.io/_uploads/BJeSPJG-bg.png)
![deepsjeng_icx_O0_callgraph](https://hackmd.io/_uploads/rkZLd1M-bg.png)

### icx_O0 (without -pg) (075)

507
![image](https://hackmd.io/_uploads/H1a3nZlM-x.png)

### icx_O1 (without -pg) (077)

301
![image](https://hackmd.io/_uploads/SJliAbeGZx.png)

### icx_O2 (without -pg) (079)

301
![image](https://hackmd.io/_uploads/B1BdgMxz-x.png)

### icx_O3 (without -pg) (081)

303
![image](https://hackmd.io/_uploads/rJj_GzlMWl.png)

### icx_O3 (without -pg) (053)

123
![image](https://hackmd.io/_uploads/HybMmOA-Ze.png)
![image](https://hackmd.io/_uploads/H1hbH_AW-l.png)
![image](https://hackmd.io/_uploads/rk_SrO0Wbe.png)
qsearch
![image](https://hackmd.io/_uploads/Bka5Hu0b-g.png)
search search search
![image](https://hackmd.io/_uploads/SyURSOAb-l.png)

### 4. icx_O3

![image](https://hackmd.io/_uploads/ry_KiyzZWg.png)
![deepsjeng_icx_O3_callgraph](https://hackmd.io/_uploads/Bk9g3kMbbe.png)

## gprofng

### Enter the directory

cd dhrystone_v1

### Compile

gcc -DUNIX dhry21a.c dhry21b.c timers_b.c -o myprog.exe

### Collect data

gprofng collect app -O result.er ./myprog.exe

### List the functions

gprofng display text -functions result.er

### View the call tree

gprofng display text -calltree result.er

## FinalProject stage 2: parallelization, and measuring CPI, cache misses, and branch prediction

### CPI and cache misses cannot be measured (VTune)

While trying to measure CPI, cache misses, and branch prediction, the following problem occurred:

``` bash
(base) edward@edward-VMware-Virtual-Platform:/opt/intel/oneapi/vtune/latest/sepdk/src$ sudo ./build-driver

C compiler to use: [ /bin/gcc ]
C compiler version: 13.3.0

Make command to use: [ /bin/make ]
Make version: 4.3

Kernel source directory: [ /lib/modules/6.14.0-35-generic/build ]
Kernel version: 6.14.0-35-generic

Cleaning workspaces ... Done
Building socperf driver ...
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
  You are using:           gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Skipping BTF generation for socperf3.ko due to unavailability of vmlinux
Done
Building sep driver ...
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
  You are using:           gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
In file included from /usr/src/linux-headers-6.14.0-35-generic/include/linux/module.h:22,
                 from lwpmudrv.c:37:
lwpmudrv.c:105:18: error: expected ‘,’ or ‘;’ before ‘INTEL_PMT’
  105 | MODULE_IMPORT_NS(INTEL_PMT);
      |                  ^~~~~~~~~
/usr/src/linux-headers-6.14.0-35-generic/include/linux/moduleparam.h:26:61: note: in definition of macro ‘__MODULE_INFO’
   26 |                 = __MODULE_INFO_PREFIX __stringify(tag) "=" info
      |                                                             ^~~~
/usr/src/linux-headers-6.14.0-35-generic/include/linux/module.h:301:33: note: in expansion of macro ‘MODULE_INFO’
  301 | #define MODULE_IMPORT_NS(ns)    MODULE_INFO(import_ns, ns)
      |                                 ^~~~~~~~~~~
lwpmudrv.c:105:1: note: in expansion of macro ‘MODULE_IMPORT_NS’
  105 | MODULE_IMPORT_NS(INTEL_PMT);
      | ^~~~~~~~~~~~~~~~
lwpmudrv.c:106:18: error: expected ‘,’ or ‘;’ before ‘INTEL_PMT_TELEMETRY’
  106 | MODULE_IMPORT_NS(INTEL_PMT_TELEMETRY);
      |                  ^~~~~~~~~~~~~~~~~~~
/usr/src/linux-headers-6.14.0-35-generic/include/linux/moduleparam.h:26:61: note: in definition of macro ‘__MODULE_INFO’
   26 |                 = __MODULE_INFO_PREFIX __stringify(tag) "=" info
      |                                                             ^~~~
/usr/src/linux-headers-6.14.0-35-generic/include/linux/module.h:301:33: note: in expansion of macro ‘MODULE_INFO’
  301 | #define MODULE_IMPORT_NS(ns)    MODULE_INFO(import_ns, ns)
      |                                 ^~~~~~~~~~~
lwpmudrv.c:106:1: note: in expansion of macro ‘MODULE_IMPORT_NS’
  106 | MODULE_IMPORT_NS(INTEL_PMT_TELEMETRY);
      | ^~~~~~~~~~~~~~~~
make[4]: *** [/usr/src/linux-headers-6.14.0-35-generic/scripts/Makefile.build:207: lwpmudrv.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [/usr/src/linux-headers-6.14.0-35-generic/Makefile:1997: .] Error 2
make[2]: *** [/usr/src/linux-headers-6.14.0-35-generic/Makefile:251: __sub-make] Error 2
make[1]: *** [Makefile:251: __sub-make] Error 2
make: *** [Makefile:243: default] Error 2

Failed to build the drivers
```

The output points to a kernel API incompatibility: the compiler rejects the MODULE_IMPORT_NS(INTEL_PMT) lines in lwpmudrv.c, which suggests the driver source was written against an older form of that kernel macro.
As a result, when running Microarchitecture Exploration, hardware performance counter (PMC) data could not be collected, because of the limits of the virtualized environment and the incompatibility between the Linux kernel (6.14) and the VTune sampling driver.

### Measuring with perf instead

==Install==
sudo apt install linux-tools-$(uname -r)
sudo apt install linux-cloud-tools-$(uname -r)
sudo apt install linux-generic

==Run==
Grant permission:
sudo sysctl kernel.perf_event_paranoid=0

perf stat -e cycles,instructions,cache-misses,cache-references,branch-misses,branches -- \
./deepsjeng_s_base.gcc_O0_nopg-m64 ref.txt

memory prefetch
probeTT
![image](https://hackmd.io/_uploads/ryAr83ombl.png)
StoreTT
![image](https://hackmd.io/_uploads/H17KUho7-e.png)
pawncpp
![image](https://hackmd.io/_uploads/H1fkvnoXWx.png)
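
The six events counted in the perf stat run are raw counts, while the quantities of interest in this stage (CPI, cache miss rate, branch misprediction rate) are ratios of them. A minimal sketch of that arithmetic follows, assuming the usual definitions: CPI = cycles / instructions, cache miss rate = cache-misses / cache-references, branch miss rate = branch-misses / branches. The function name `derive_metrics` is hypothetical, and the numbers in the usage line are made up for illustration, not measured results.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of perf): turns six raw counter values,
# copied by hand from a perf stat run, into the three derived ratios.
# Argument order: cycles instructions cache-misses cache-references branch-misses branches
derive_metrics() {
    awk -v cyc="$1" -v ins="$2" -v cm="$3" -v cr="$4" -v bm="$5" -v br="$6" 'BEGIN {
        # Every ratio divides by one of these; refuse to divide by zero.
        if (ins == 0 || cr == 0 || br == 0) {
            print "error: instructions, cache-references and branches must be non-zero"
            exit 1
        }
        printf "CPI=%.3f\n", cyc / ins                     # cycles per instruction
        printf "cache_miss_rate=%.2f%%\n", 100 * cm / cr   # misses per cache reference
        printf "branch_miss_rate=%.2f%%\n", 100 * bm / br  # mispredicted share of branches
    }'
}

# Illustrative values only (not from a real run):
derive_metrics 2000 1000 50 1000 10 500
```

Pass the counts without the thousands separators that perf prints. Comparing the three ratios across the gcc and icx builds at each optimization level is more informative than comparing raw counts, since the builds execute different numbers of instructions.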