# 2017q1 Homework5(matrix)
contributed by <`rayleigh0407`>

## Development Environment

```shell
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Stepping:              3
CPU MHz:               800.000
CPU max MHz:           4200.0000
CPU min MHz:           800.0000
BogoMIPS:              8016.72
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
```

## Preparation

1. Read [Matrix Multiplication using SIMD](https://hackmd.io/s/Hk-llHEyx)
    - The description of `_mm_mullo_epi32` there confused me. After looking it up, I found that the actual operation is as follows: each pair of 32-bit lanes is multiplied into a 64-bit intermediate, and only the low 32 bits are stored in the result.
    ```
    FOR j := 0 to 3
        i := j*32
        tmp[63:0] := a[i+31:i] * b[i+31:i]
        dst[i+31:i] := tmp[31:0]
    ENDFOR
    ```

### References

- [Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE4_1,SSE4_2&expand=3726,3726&text=_mm_mullo_epi32)
- [# and ## in macros](http://stackoverflow.com/questions/4364971/and-in-macros)
- [The C Preprocessor: Concatenation - GCC](https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html)
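
As a supplement to the `_mm_mullo_epi32` pseudocode in the Preparation section, the following is a minimal scalar sketch of the same per-lane behaviour in plain C. It does not call the intrinsic itself; the function name `mullo_epi32_scalar` and the use of plain `int32_t[4]` arrays in place of `__m128i` are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_mullo_epi32 (not part of any Intel API).
 * Each of the four 32-bit lanes is multiplied into a 64-bit intermediate,
 * and only the low 32 bits of that product are kept. */
void mullo_epi32_scalar(const int32_t a[4], const int32_t b[4], int32_t dst[4])
{
    for (int j = 0; j < 4; j++) {
        /* tmp[63:0] := a[i+31:i] * b[i+31:i] */
        int64_t tmp = (int64_t)a[j] * (int64_t)b[j];

        /* dst[i+31:i] := tmp[31:0]
         * Converting to uint32_t truncates modulo 2^32 (well defined).
         * The final conversion back to int32_t is implementation-defined
         * in C, but wraps as two's complement on common compilers. */
        dst[j] = (int32_t)(uint32_t)tmp;
    }
}
```

With inputs `{1, -2, 65536, 2147483647}` and `{3, 4, 65536, 2}`, the third lane's product is 2^32, so its low 32 bits are 0, and the fourth lane's product 4294967294 wraps to -2. This shows that the high half of each product is silently discarded.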